We consider the delay of network coding compared to routing with retransmissions in packet erasure networks with probabilistic erasures. We investigate the sub-linear term in the block delay required for unicasting $n$ packets and show that there is an unbounded gap between network coding and routing. In particular, we show that the delay benefit of network coding scales at least as $\sqrt{n}$. Our analysis of the delay function for the routing strategy involves a major technical challenge of computing the expectation of the maximum of two negative binomial random variables. This problem has been studied previously and we derive the first exact characterization, which may be of independent interest. We also use a martingale bounded differences argument to show that the actual coding delay is tightly concentrated around its expectation.
Index Terms

Block delay, network coding, packet erasure correction, retransmission, unicast. 



I. Introduction 

This paper considers the block delay for unicasting a file consisting of $n$ packets over a packet erasure network with probabilistic erasures. Such networks have been extensively studied from the standpoint of capacity. Various schemes involving coding or retransmissions have been shown to be capacity-achieving for unicasting in networks with packet erasures, e.g. [?], [?], [?], [?]. For a capacity-achieving strategy, the expected block delay for transmitting $n$ packets is $\frac{n}{C} + D(n)$ where $C$ is the minimum cut capacity and the delay function $D(n)$ is sublinear in $n$ but differs in general for different strategies. In general networks, the optimal $D(n)$ is achieved by random linear network coding, in that decoding succeeds with high probability for any realization of packet erasure events for which the corresponding minimum cut capacity is $n$ (see footnote 1). However, relatively little has been known previously about the behavior of the delay function $D(n)$ for coding or retransmission strategies.

In this paper, we analyze the delay function $D(n)$ for random linear network coding (coding for short) as well as an uncoded hop-by-hop retransmission strategy (routing for short) where only one copy of each packet is kept in intermediate node buffers. Schemes such as [5], [?] ensure that there is only one copy of each packet in the network; without substantial non-local coordination or feedback, it is complicated for an uncoded topology-independent scheme to keep track of multiple copies of packets at intermediate nodes and prevent capacity loss from duplicate packet transmissions. We also assume that the routing strategy fixes how many packets will traverse each route a priori based on link statistics, without adjusting to link erasure realizations. While routing strategies could dynamically re-route packets under atypical realizations, this would not be practical if the min-cut links are far from the source. On the other hand, network coding allows redundant packets to be transmitted efficiently in a topology-independent manner, without feedback or coordination, except for an acknowledgment from the destination when it has received the entire file. As such, network coding can fully exploit variable link realizations. These differences result in a coding advantage in the delay function $D(n)$ which, as we will show, can be unbounded with increasing $n$.

The material of this paper was presented in part at the IEEE International Symposium on Information Theory 2009 and the IEEE Information Theory Workshop 2010.

1 The field size and packet length are assumed in this paper to be sufficiently large so that the probability of rank-deficient choices of coding coefficients can be neglected, along with the fractional overhead of specifying the random coding vectors.

A major technical challenge in the analysis of the delay function for the routing strategy involves computing the expectation 
of the maximum of two independent negative binomial random variables. This problem has been previously studied in [6],
where the authors explain in detail why it is complicated (see footnote 2) and derive an approximate solution to the problem. Our analysis addresses
this open problem by finding an exact expression and showing that it grows to infinity at least as the square root of n. 

Related work on queuing delay in uncoded [7], [8] and coded [9] systems has considered the case of random arrivals and
their results pertain to the delay of individual packets in steady state. This differs from our work which considers the delay 
for communicating a fixed size batch of n packets that are initially present at the source. 



A. Main results 

For a line network, the capacity is given by the worst link. We show a finite bound on the delay function that applies to 
both coding and the routing scheme when there is a single worst link. 

Theorem 1. Consider $n$ packets communicated through a line network of $\ell$ links with erasure probabilities $p_1, p_2, \ldots, p_\ell$ where there is a unique worst link:

$$p_m := \max_{1 \le i \le \ell} p_i, \qquad p_i < p_m < 1 \quad \forall\, i \ne m.$$

The expected time $\mathbb{E}T_n$ to send all $n$ packets either with coding or routing is:

$$\mathbb{E}T_n = \frac{n}{1 - \max_{1 \le i \le \ell} p_i} + D(n, p_1, p_2, \ldots, p_\ell), \tag{1}$$

where the delay function $D(n, p_1, p_2, \ldots, p_\ell)$ is non-decreasing in $n$ and upper bounded by:

$$\bar{D}(p_1, p_2, \ldots, p_\ell) := \sum_{i=1,\, i \ne m}^{\ell} \frac{p_m}{p_m - p_i}.$$

If on the other hand there are two links that take the worst value, then the delay function is not bounded but still exhibits sublinear behavior. Pakzad et al. [10] show that in the case of a line network with identical links, the optimal delay function grows as $\sqrt{n}$. This is achieved by both coding and the routing strategy (see footnote 3).

2 Authors in [6] deal with the expected maximum of any number of negative binomial distributions but the difficulty remains even for two negative binomial distributions.

In contrast, for parallel path networks, we show that the delay function behaves quite differently for the coded and uncoded 
schemes. 

Theorem 2. The expected time $\mathbb{E}T_n^c$ taken to send $n$ packets using coding over a $k$-parallel path multi-hop network is

$$\mathbb{E}T_n^c = \frac{n}{k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij}} + D_n^c$$

where the delay function $D_n^c$ depends on all the erasure probabilities $p_{ij}$, for $i \in \{1, \ldots, k\}$, $1 \le j \le \ell$. In the case where there is a single worst link in each path $D_n^c$ is bounded, i.e. $D_n^c \in O(1)$, whereas if there are multiple worst links in at least one path then $D_n^c \in O(\sqrt{n})$. The result holds regardless of any statistical dependence between erasure processes on different paths.

Theorem 3. The expected time $\mathbb{E}T_n^r$ taken to send $n$ packets through a $k$-parallel path network by routing is

$$\mathbb{E}T_n^r = \frac{n}{k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij}} + D_n^r \tag{2}$$

where the delay function $D_n^r$ depends on all the erasure probabilities $p_{ij}$, for $i \in \{1, \ldots, k\}$, $1 \le j \le \ell$, and grows at least as $\sqrt{n}$, i.e. $D_n^r \in \Omega(\sqrt{n})$.

The above results on parallel path networks generalize to arbitrary topologies. We define single-bottleneck networks as 
networks that have a single min-cut. 

Theorem 4. In a network of erasure channels with a single source $S$ and a single receiver $T$ the expected time $\mathbb{E}T_n^r$ taken to send $n$ packets by routing is

$$\mathbb{E}T_n^r = \frac{n}{C} + D_n^r$$

where $C$ is the capacity of the network and $D_n^r \in \Omega(\sqrt{n})$. In the case of network coding the expected time $\mathbb{E}T_n^c$ taken to send $n$ packets is

$$\mathbb{E}T_n^c = \frac{n}{C} + D_n^c$$

where $D_n^c \in O(1)$ for single-bottleneck networks.

3 The result in [10] is derived for the routing strategy which is delay-optimal in a line network; as discussed above, coding in a sufficiently large field is delay-optimal in any network.

We also prove the following concentration result:

Theorem 5. The time $T_n^c$ for $n$ packets to be transmitted from a source to a sink over a network of erasure channels using network coding is concentrated around its expected value with high probability. In particular for sufficiently large $n$:

$$\mathbb{P}\left[\, |T_n^c - \mathbb{E}T_n^c| > \epsilon_n \right] \le \frac{2C}{n} + o\!\left(\frac{1}{n}\right), \tag{3}$$



where $C$ is the capacity of the network and $\epsilon_n$ represents the corresponding deviation and is equal to $\epsilon_n = n^{1/2+\delta}/C$, $\delta \in (0, 1/2)$.

Since $\mathbb{E}T_n^c$ grows linearly in $n$ and the deviations $\epsilon_n$ are sublinear, $T_n^c$ is tightly concentrated around its expectation for large $n$ with probability approaching one. Subsequent to our initial conference publications [11], [12], further results on delay for line networks have been obtained by [?], [14].

II. Model 

We consider a network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V}$ denotes the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the set of edges or links. We assume a discrete time model, where at each time step each node $v \in \mathcal{V}$ can transmit one packet on its outgoing edges. For every edge $e \in \mathcal{E}$ each transmission succeeds with probability $1 - p_e$ or the transmitted packet gets erased with probability $p_e$; erasures across different edges and time steps are assumed to be independent. In our model, in case of a success the packet is assumed to be transmitted to the next node instantaneously, i.e. we ignore the transmission delay along the links. We assume that no edge fails with probability 1 (i.e. $p_e < 1$ for all $e \in \mathcal{E}$) since in such a case we can remove that edge from the network.

Within network $\mathcal{G}$ there is a single source $S \in \mathcal{V}$ that wishes to transmit $n$ packets to a single destination $T$ in $\mathcal{G}$. We investigate the expected time it takes for the $n$ packets to be received by $T$ under two transmission schemes, network coding and routing. When network coding is employed, each packet transmitted by a node $v \in \mathcal{V}$ is a random linear combination of all previously received packets at the node $v$. The destination node $T$ decodes once it has received $n$ linearly independent combinations of the initial packets. When routing is employed, the number of packets transmitted in each path is fixed ahead of the transmission, in such a way that the expected time for all $n$ packets to reach destination $T$ is minimized.

All nodes in the network are assumed to have sufficiently large buffers to store the necessary number of packets to accommodate the transmission scheme. In the case of routing, we assume an automatic repeat request (ARQ) scheme with instantaneous feedback available on each hop. Thus, a node can drop a packet that has been successfully received by the next node. For the case of coding, as explained in [13], information travels through the network in the form of innovative packets, where a packet is innovative for a node $v$ if it is not in the linear span of packets previously received by $v$. For simplicity of analysis, we assume that a node can store up to $n$ linearly independent packets; smaller buffers can be used in practice (see footnote 4). Feedback is not needed except when the destination $T$ receives all the information and signals the end of transmission to all nodes. Our results hold without any restrictions on the number of packets $n$ or the number of edges in the network, and there is no requirement for the network to reach steady state.

III. Line Networks 

The line network under consideration is depicted in Figure 1. The network consists of $\ell$ links $L_i$, $1 \le i \le \ell$, and $\ell + 1$ nodes $N_j$, $0 \le j \le \ell$. Node $N_j$, $0 \le j \le \ell - 1$, is connected to node $N_{j+1}$ to its right through the erasure link $L_{j+1}$, where we assume that the source $S$ and the destination $T$ are also defined as nodes $N_0$ and $N_\ell$ respectively. The probability of transmission failure on each link $L_i$ is denoted by $p_i$.

4 By the results of [?], the buffer size needed for coding is no larger than that needed for routing.

Fig. 1. Multi-hop line network between the source and the destination.

For the case of a line network there is no difference between network coding and routing in the expected time it takes to 
transmit a fixed number of packets. Note that coding at each hop (network coding) is needed to achieve minimum delay in 
the absence of feedback, whereas coding only at the source is suboptimal in terms of throughput and delay [2].

Proof of Theorem 1: By using the interchangeability result on service stations from Weber [16], we can interchange the position of any two links without affecting the departure process of node $N_{\ell-1}$ and therefore the delay function. Consequently, we can interchange the worst link in the queue (which is unique by the assumptions of Theorem 1) with the first link, and thus we will assume that the first link is the worst link ($p_2, p_3, \ldots, p_\ell < p_1 < 1$).

Note that in a line network, under coding the subspace spanned by all packets received so far at a node $N_i$ contains that of its next hop node $N_{i+1}$, similarly to the case of routing where the set of packets received at a node $N_i$ is a superset of that of its next hop node $N_{i+1}$. Let the random variable $R_i^n$, $1 \le i \le \ell - 1$, denote the rank difference between node $N_i$ and node $N_{i+1}$ at the moment packet $n$ arrives at $N_1$. This is exactly the number of packets present at node $N_i$ that are innovative for $N_{i+1}$ (which for brevity we refer to simply as innovative packets at node $N_i$ in this proof) at the random time when packet $n$ arrives at $N_1$. For any realization of erasures, the evolution of the number of innovative packets at each node is the same under coding and routing.

The time $T_n$ taken to send $n$ packets from the source node $S$ to the destination $T$ can be expressed as the sum of the time $T_n^{(1)}$ required for all the $n$ packets to cross the first link and the time $\tau_n$ required for all the remaining innovative packets $R_1^n, \ldots, R_{\ell-1}^n$ at nodes $N_1, \ldots, N_{\ell-1}$ respectively to reach the destination node $T$:

$$T_n = T_n^{(1)} + \tau_n.$$

All the quantities in the equation above are random variables and we want to compute their expected values. Due to the linearity of expectation,

$$\mathbb{E}T_n = \mathbb{E}T_n^{(1)} + \mathbb{E}\tau_n, \tag{4}$$

and by defining $X_j^{(1)}$, $1 \le j \le n$, to be the time taken for packet $j$ to cross the first link, we get:

$$\mathbb{E}T_n^{(1)} = \sum_{j=1}^{n} \mathbb{E}X_j^{(1)} = \frac{n}{1 - p_1}, \tag{5}$$

since $X_j^{(1)}$, $1 \le j \le n$, are all geometric random variables ($\mathbb{P}[X_j^{(1)} = k] = (1 - p_1)\, p_1^{k-1}$, $k \ge 1$). Therefore combining
TABLE I
The delay function $D(n, p_1, p_2)$ for different values of $n$

$n$    $D(n, p_1, p_2)$
1    $\dfrac{1}{1 - p_2}$
2    $\dfrac{2}{1 - p_2} - \dfrac{1}{1 - p_1 p_2}$
3    $\dfrac{1 + p_2\left(2 - p_1\left(6 - p_1 + (2 - 5p_1)p_2 + (1 - 3(1 - p_1)p_1)p_2^2\right)\right)}{(1 - p_2)(1 - p_1 p_2)^3}$
4    $\dfrac{1 + p_2\left(3 - p_1\left(11 + 4p_1^4 p_2^4 + p_2(5 + (5 - p_2)p_2) + p_1^3 p_2\left(1 - p_2(5 + 2p_2(5 + 3p_2))\right) - p_1\left(4 + p_2(15 + p_2(21 - (1 - p_2)p_2))\right) + p_1^2\left(1 - p_2(1 - p_2(31 + p_2(5 + 4p_2)))\right)\right)\right)}{(1 - p_2)(1 - p_1 p_2)^5}$



equations (4) and (5) we get:

$$\mathbb{E}T_n = \frac{n}{1 - p_1} + \mathbb{E}\tau_n. \tag{6}$$

Equations (1) and (6) give

$$D(n, p_1, p_2, \ldots, p_\ell) = \mathbb{E}\tau_n,$$

which is the expected time taken for all the remaining innovative packets at nodes $N_1, \ldots, N_{\ell-1}$ to reach the destination. For the simplest case of a two-hop network ($\ell = 2$) we can derive recursive formulas for computing this expectation for each $n$. Table I has closed-form expressions for the delay function $D(n, p_1, p_2)$ for $n = 1, \ldots, 4$. It is seen that as $n$ grows, the number of terms in the above expression increases rapidly, making these exact formulas impractical, and as expected for larger values of $\ell$ ($\ell \ge 3$) the situation only worsens. Our subsequent analysis derives tight upper bounds on the delay function $D(n, p_1, p_2, \ldots, p_\ell)$ for any $\ell$ which do not depend on $n$.
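The two-hop recursion mentioned above can be evaluated exactly for small $n$. The following is a minimal sketch, not the derivation used in the paper: it assumes the slot model of Section II with link 2 serving only packets that were already queued at $N_1$ before the current slot, it tracks the distribution of the number of innovative packets at $N_1$ when each new packet arrives, and it uses $D(n, p_1, p_2) = \mathbb{E}R_1^n / (1 - p_2)$. The helper name `delay_two_hop` is hypothetical.

```python
from fractions import Fraction


def delay_two_hop(n, p1, p2):
    """Exact D(n, p1, p2) for a two-hop line network (illustrative helper).

    p1, p2 are Fractions: erasure probabilities of the first and second link.
    """
    a, b = 1 - p2, 1 - p1            # per-slot success prob. of link 2 and link 1
    norm = 1 - p1 * p2               # prob. that a slot is not a double erasure
    stay = (1 - a) * b / norm        # next packet arrives, nothing departed
    leave_stop = a * b / norm        # one departure and the arrival in the same slot
    leave_go = a * (1 - b) / norm    # one departure, still waiting for the arrival
    dist = {1: Fraction(1)}          # distribution of R_1^n (packet n included)
    for _ in range(n - 1):
        nxt = {}
        for r, pr in dist.items():
            carry = pr               # mass still waiting for the next arrival
            for left in range(r, 0, -1):
                nxt[left + 1] = nxt.get(left + 1, 0) + carry * stay
                nxt[left] = nxt.get(left, 0) + carry * leave_stop
                carry *= leave_go
            nxt[1] = nxt.get(1, 0) + carry   # queue emptied before the arrival
        dist = nxt
    # Each queued packet needs 1/(1 - p2) slots on average to cross link 2.
    return sum(r * pr for r, pr in dist.items()) / a
```

With exact rational inputs the values can be compared directly against the closed forms of Table I and against the bound $p_1/(p_1 - p_2)$ of Theorem 1 when $p_2 < p_1$; the cost grows roughly cubically in $n$, so this is only meant for small cases.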

The $(\ell - 1)$-tuple $Y_n = (R_1^n, \ldots, R_{\ell-1}^n)$ representing the number of innovative packets remaining at nodes $N_1, \ldots, N_{\ell-1}$ at the moment packet $n$ arrives at node $N_1$ (including packet $n$) is a multidimensional Markov process with state space $E \subset \mathbb{N}^{\ell-1}$ (the state space is a proper subset of $\mathbb{N}^{\ell-1}$ since $Y_n$ can never take the values $(0, *, \ldots, *)$ where the $*$ represents any integer value). Using the coupling method [17] and an argument similar to the one given at Proposition 2 in [18] it can be shown that $Y_n$ is a stochastically increasing function of $n$ (meaning that as $n$ increases there is a higher probability of having more innovative packets at nodes $N_1, \ldots, N_{\ell-1}$).

Proposition 1. The Markov process $Y_n = (R_1^n, \ldots, R_{\ell-1}^n)$ is stochastically increasing.

Proof: Given in Appendix A along with the necessary definitions. ■

A direct result of Proposition 1 is that the expected time $\mathbb{E}\tau_n$ taken for the remaining innovative packets at nodes $N_1, \ldots, N_{\ell-1}$ to reach the destination is a non-decreasing function of $n$:

$$\mathbb{E}\tau_n \le \mathbb{E}\tau_{n+1} \le \lim_{n \to \infty} \mathbb{E}\tau_n \tag{7}$$

where the second inequality is meaningful when the limit exists.

Innovative packets travelling in the network from node $N_1$ to the destination node $T$ can be viewed as customers travelling through a network of service stations in tandem. Indeed, each innovative packet (customer) arrives at the first station (node $N_1$) with a geometric arrival process and the transmission (service) time is also geometrically distributed. Once an innovative packet has been transmitted (serviced) it leaves the current node (station) and arrives at the next node (station) waiting for its next transmission (service).

It is helpful to assume the first link to be the worst one in order to use the results of Hsu and Burke in [19]. The authors proved that a tandem network with geometrically distributed service times and a geometric input process reaches steady state as long as the input process is slower than any of the service times. Our line network is depicted in Figure 1 and the input process (of innovative packets) is the geometric arrival process at node $N_1$ from the source $S$. Since $p_2, p_3, \ldots, p_\ell < p_1$ the arrival process is slower than any service process (transmission of the innovative packet to the next hop) and therefore the network in Figure 1 reaches steady state.

Sending an arbitrarily large number of packets ($n \to \infty$) makes the problem of estimating $\lim_{n \to \infty} \mathbb{E}\tau_n$ (see footnote 5) the same as calculating the expected time taken for all the remaining innovative packets at nodes $N_1, \ldots, N_{\ell-1}$ to reach the destination $T$ at steady state. This is exactly the expected end-to-end delay for a single customer in a line network that has reached equilibrium. This quantity has been calculated in [20] (page 67, Theorem 4.10) and is equal to

$$\lim_{n \to \infty} \mathbb{E}\tau_n = \sum_{i=2}^{\ell} \frac{p_1}{p_1 - p_i}. \tag{8}$$

Combining equations (7) and (8) and changing $p_1$ to $p_m := \max_{1 \le i \le \ell} p_i < 1$ concludes the proof of Theorem 1. ■



IV. k-Parallel Path Network

We define the $k$-parallel path network as the network depicted in Figure 2. This network consists of $k$ parallel multi-hop line networks (paths) with $k\ell$ nodes and $k\ell$ links, with $\ell$ links in each path (our results are readily extended to networks with different number of links in each path). Each node $N_{i(j-1)}$ is connected to the node $N_{ij}$ on its right by a link $L_{ij}$, for $i \in \{1, \ldots, k\}$ and $1 \le j \le \ell$, where for consistency we assume that the source $S$ and the destination $T$ are defined as nodes $N_{i0}$ and $N_{i\ell}$, $i \in \{1, \ldots, k\}$, respectively.

5 If the network was not reaching a steady state the above limit would diverge.

Fig. 2. Two parallel multi-hop line networks having links with different erasure probabilities.

For the case of routing with retransmissions, the source $S$ divides the $n$ packets between the different paths so that the time taken to send all the packets is minimized in expectation. This is accomplished by having the number of packets that cross each path be proportional to the capacity of the path. Indeed, if the source $S$ sends $n_1, \ldots, n_k$ packets through each path then according to Theorem 1 the expected time to send these packets is $\frac{n_i}{1 - p_{i1}} + D_{n_i}$, $i \in \{1, \ldots, k\}$, where $D_{n_i}$ are bounded delay functions. The values $n_i$ are chosen so that the linear terms of the above expected values are equal, i.e. $\frac{n_1}{1 - p_{11}} = \ldots = \frac{n_k}{1 - p_{k1}}$ and $n_1 + \cdots + n_k = n$. Therefore the choice of

$$n_i = \frac{n (1 - p_{i1})}{k - \sum_{i=1}^{k} p_{i1}}, \quad i \in \{1, \ldots, k\} \tag{9}$$

minimizes the expected time to send the $n$ packets. Therefore from now on, when routing is performed, source $S$ is assumed to send $n(1 - p_{i1})/(k - \sum_{i=1}^{k} p_{i1})$ packets over each path $i$ (see footnote 6).
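The a priori split of equation (9) is straightforward to compute. The sketch below is only an illustration under the assumption that the worst-link erasure probability of each path is known and passed in as an exact fraction; the function name `routing_split` is hypothetical, and no rounding to integers is attempted.

```python
from fractions import Fraction


def routing_split(n, worst_erasure):
    """Packets per path, proportional to path capacity (illustrative helper).

    worst_erasure[i] is the erasure probability of the worst link of path i.
    """
    capacities = [1 - p for p in worst_erasure]   # capacity of each path
    total = sum(capacities)                       # k minus the sum of the p's
    return [n * c / total for c in capacities]
```

For example, with two paths whose worst links erase with probabilities $1/3$ and $2/3$, nine packets are split as six and three, so that both linear terms $n_i/(1 - p_{i1})$ equal $9$.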



A. Coding Strategy 



Fig. 3. A network of $k$ parallel erasure links with erasure probabilities $q_1, \ldots, q_k$ connecting source $S$ and destination $T$.

Before we analyze the expected time $\mathbb{E}T_n^c$ taken to send $n$ packets through the network in Figure 2 using coding (where the $c$ superscript stands for coding), we prove the following proposition that holds for the simplified network of $k$ parallel erasure links connecting the source to the destination as in Figure 3.

Proposition 2. The expected time $\mathbb{E}\hat{T}_n^c$ taken to send by coding $n$ packets from source $S$ to destination $T$ through $k$ parallel erasure links with erasure probabilities $q_1, \ldots, q_k$ respectively is

$$\mathbb{E}\hat{T}_n^c = \frac{n}{k - \sum_{i=1}^{k} q_i} + B_n$$

where $B_n$ is a bounded term. This relation holds regardless of any statistical dependence between the erasure processes on different links.

6 To simplify the notation we will assume that all numbers $n(1 - p_{i1})/(k - \sum_{i=1}^{k} p_{i1})$ are integers. Our results extend to the case that those numbers are not integers by rounding them to the closest integer.
Proof: We define $A_0, A_1, \ldots, A_k$ to be the probabilities of having $0, 1, \ldots, k$ links succeed at a specific time instance. The recursive formula for $\mathbb{E}\hat{T}_n^c$ is:

$$\begin{aligned}
\mathbb{E}\hat{T}_n^c &= A_0 (\mathbb{E}\hat{T}_n^c + 1) + A_1 (\mathbb{E}\hat{T}_{n-1}^c + 1) + \ldots + A_k (\mathbb{E}\hat{T}_{n-k}^c + 1) \\
\Leftrightarrow\ (1 - A_0)\, \mathbb{E}\hat{T}_n^c &= A_1\, \mathbb{E}\hat{T}_{n-1}^c + \ldots + A_k\, \mathbb{E}\hat{T}_{n-k}^c + 1
\end{aligned} \tag{10}$$

where $\mathbb{E}\hat{T}_m^c = 0$ for $m \le 0$ and the last term in (10) is obtained from the relation $\sum_{i=0}^{k} A_i = 1$.

The general solution of (10) is given by the sum of a homogeneous solution and a special solution. A special solution for the non-homogeneous recursive equation (10) is linear, $D \cdot n$, where after some algebra $D = 1/(A_1 + 2A_2 + \ldots + kA_k)$, which is the inverse of the expected number of links succeeding in a given instant. Therefore $D = 1/(k - \sum_{i=1}^{k} q_i)$, independent of any statistical dependence between erasures on different links.

The homogeneous solution of the linear recurrence relation with constant coefficients (10) can be expressed in terms of the roots of the characteristic equation $p(x) = (1 - A_0)x^k - A_1 x^{k-1} - \ldots - A_k$ [21, Section 3.2]. We will prove that the characteristic equation has $x = 1$ as a root and all the other roots have absolute value less than 1. Indeed since $A_0 + \ldots + A_k = 1 \Rightarrow (1 - A_0) - A_1 - \ldots - A_k = 0$, therefore $x = 1$ is a root of $p(x)$; now assume that $x = 1$ is a multiple root of $p(x)$. Then

$$\begin{aligned}
& p'(1) = 0 \\
\Leftrightarrow\ & k(1 - A_0) - (k-1)A_1 - \ldots - A_{k-1} = 0 \\
\Leftrightarrow\ & k(1 - A_0) - (k-1)A_1 - \ldots - (k - (k-1))A_{k-1} = 0 \\
\Leftrightarrow\ & k = k(A_0 + A_1 + \ldots + A_{k-1}) - A_1 - 2A_2 - \ldots - (k-1)A_{k-1} \\
\Leftrightarrow\ & k = k(1 - A_k) - A_1 - 2A_2 - \ldots - (k-1)A_{k-1} \\
\Leftrightarrow\ & k = k - (A_1 + 2A_2 + \ldots + kA_k) \\
\Leftrightarrow\ & 0 = A_1 + 2A_2 + \ldots + kA_k \\
\Leftrightarrow\ & k = q_1 + q_2 + \ldots + q_k.
\end{aligned}$$

This implies that all links fail with probability 1, which contradicts the assumption from Section II that no link fails with probability 1. Assume now that the characteristic equation $p(x)$ has a complex root $x = r e^{i\phi}$ where $|x| > 1$ or equivalently $r > 1$. Define $f(x) = x^k$ and $g(x) = A_0 x^k + A_1 x^{k-1} + \ldots + A_k$; then $p(x) = 0$ is equivalent to $f(x) = g(x)$, but this last equality cannot hold since $|g(x)| < |f(x)|$ for $|x| > 1$. Indeed $|g(x)| \le A_0 |x|^k + A_1 |x|^{k-1} + \ldots + A_k = A_0 r^k + A_1 r^{k-1} + \ldots + A_k < (A_0 + A_1 + \ldots + A_k) r^k = r^k = |f(x)|$.

Let $R = \{r : p(r) = 0\}$ be the set of all roots of $p(x)$, written as $r_j e^{i\phi_j}$. The general solution for recursion formula (10) is

$$\mathbb{E}\hat{T}_n^c = \frac{n}{k - \sum_{i=1}^{k} q_i} + \sum_{j} \left[ F_j r_j^n \cos(n \phi_j) + G_j r_j^n \sin(n \phi_j) \right].$$

We can set

$$B_n = \sum_{j} F_j r_j^n \cos(n \phi_j) + \sum_{j} G_j r_j^n \sin(n \phi_j) \tag{11}$$

and since $|B_n| \le \sum_{j} \left( |F_j| + |G_j| \right)$ this concludes our proof. ■
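Recursion (10) can also be iterated numerically to see the bounded term $B_n$ directly. The following is a minimal sketch under an extra assumption that the proposition itself does not need: the links are taken to be independent, so that $A_0, \ldots, A_k$ follow from the individual erasure probabilities. The function names are hypothetical.

```python
from fractions import Fraction


def success_count_pmf(q):
    """A_0..A_k: pmf of the number of successful links in one slot,
    assuming independent links with erasure probabilities q[i]."""
    pmf = [Fraction(1)]
    for qi in q:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for s, pr in enumerate(pmf):
            nxt[s] += pr * qi             # this link is erased
            nxt[s + 1] += pr * (1 - qi)   # this link succeeds
        pmf = nxt
    return pmf


def expected_coding_time(n, q):
    """Expected time to deliver n packets over parallel links via recursion (10)."""
    A = success_count_pmf(q)
    k = len(q)
    ET = [Fraction(0)] * (n + 1)          # ET[m] = 0 for m <= 0
    for m in range(1, n + 1):
        acc = Fraction(1)
        for s in range(1, k + 1):
            if m - s > 0:
                acc += A[s] * ET[m - s]
        ET[m] = acc / (1 - A[0])
    return ET[n]
```

For two links with $q_1 = q_2 = 1/2$ the characteristic polynomial has roots $1$ and $-1/3$, and subtracting $n/(k - q_1 - q_2) = n$ from the output leaves a term that settles at $1/4$, which illustrates the boundedness of $B_n$.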
Now we are ready to prove the following theorem for the $k$-parallel path network shown in Figure 2.

Proof of Theorem 2: As discussed in the proof of Theorem 1, by using the results of [16] we can interchange the position of the first link of each path with one of the worst links of the path without affecting the arrival process at the receiver $T$. Therefore without loss of generality we will assume that the first link in each path is one of the worst links in the path. Also, as in the proof of Theorem 1, for brevity we refer to packets present at a node $N_i$ that are innovative for the next hop node $N_{i+1}$ as innovative packets at node $N_i$.

The time $T_n^c$ taken to send $n$ packets from source $S$ to the destination $T$ in Figure 2 can be expressed as the sum of the time $\hat{T}_n^c$ required for all $n$ packets to reach one of nodes $N_{11}, \ldots, N_{k1}$ and the remaining time $\tilde{T}_n^c$ required for all innovative packets remaining in the network to reach the destination $T$, i.e.

$$T_n^c = \hat{T}_n^c + \tilde{T}_n^c. \tag{12}$$

As in the proof of Theorem 1, all quantities in equation (12) are random variables and we want to compute their expected values. Due to linearity of expectation,

$$\mathbb{E}T_n^c = \mathbb{E}\hat{T}_n^c + \mathbb{E}\tilde{T}_n^c, \tag{13}$$

where by Proposition 2

$$\mathbb{E}\hat{T}_n^c = \frac{n}{k - \sum_{i=1}^{k} p_{i1}} + B_n \tag{14}$$

where $B_n$ is bounded. This holds regardless of any statistical dependence between the erasure processes on the first link of each path, and the remainder of the proof is unaffected by any statistical dependence between erasure processes on different paths.

The time $\mathbb{E}\tilde{T}_n^c$ required to send all the remaining innovative packets at nodes $N_{ij}$ ($i \in \{1, \ldots, k\}$, $j \in \{1, \ldots, \ell - 1\}$) to the destination is less than the expected time $\mathbb{E}\hat{\tau}$ it would have taken if all the remaining innovative packets were returned back to the source $S$ and sent to the destination $T$ using only the first path. Let $R_{ij}$ denote the number of remaining innovative packets at node $N_{ij}$ at the moment the $n$-th packet has arrived at one of the $k$ nodes $N_{11}, \ldots, N_{k1}$. Then the total number of remaining innovative packets $R$ is $R = \sum_{i=1}^{k} \sum_{j=1}^{\ell-1} R_{ij}$ and the expected time $\mathbb{E}\hat{\tau}$ is upper bounded by

$$\mathbb{E}\hat{\tau} = \mathbb{E}\left[\mathbb{E}(\hat{\tau} \mid R)\right] \le \sum_{j=1}^{\ell} \frac{\mathbb{E}R}{1 - p_{1j}}, \tag{15}$$

where $\mathbb{E}R/(1 - p_{1j})$ is the expected time taken for $R$ packets to cross the $j$-th hop in the first path.

By combining the fact that $\mathbb{E}\tilde{T}_n^c \le \mathbb{E}\hat{\tau}$ with equations (13)-(15) we get

$$\mathbb{E}T_n^c = \frac{n}{k - \sum_{i=1}^{k} p_{i1}} + D_n^c \tag{16}$$

where $D_n^c$ is upper bounded by

$$D_n^c \le B_n + \sum_{j=1}^{\ell} \frac{\mathbb{E}R}{1 - p_{1j}}.$$

By Proposition 1 the number of remaining innovative packets at each node of each path is a stochastically increasing random variable with respect to $n$. Therefore, the expected number of remaining packets is an increasing function of $n$. Consequently one can find an upper bound on $\mathbb{E}R_{ij}$ by examining the line network in steady state, or equivalently, as $n \to +\infty$. For the case where the first link of each path is the unique worst link of the path, as shown in [19], each line network will reach steady state and consequently $\mathbb{E}R \in O(1)$. If there are multiple worst links in at least one path, then $\mathbb{E}R \in O(\sqrt{n})$. This can be seen by interchanging the positions of links such that the worst links of each path are positioned at the start. By the results of [10], the number of innovative packets remaining at nodes positioned between two such worst links is $O(\sqrt{n})$. By the results of [19], the number of innovative packets remaining at other intermediate network nodes is $O(1)$.

Substituting $p_{i1}$ with $\max_{1 \le j \le \ell} p_{ij}$ for $i \in \{1, \ldots, k\}$ in equation (16) concludes the proof. ■

B. Routing Strategy 

In this section we analyze the expected time $\mathbb{E}T_n^r$ taken to send $n$ packets through the parallel path network in Figure 2 using routing (where the $r$ superscript stands for routing). We first prove the following two propositions.

Proposition 3. For $a, b, c_1, c_2 \in \mathbb{N}^+$ with $a \le b$ the sum $\sum_{m=a}^{b} \frac{c_1 - m}{c_2 + m}$ is equal to:

$$\sum_{m=a}^{b} \frac{c_1 - m}{c_2 + m} = a - b - 1 + (c_1 + c_2)\left(H_{c_2+b} - H_{c_2+a-1}\right) \tag{17}$$

where $H_n$ is the $n$-th Harmonic number, i.e. $H_n = \sum_{i=1}^{n} \frac{1}{i}$.

Proof:

$$\sum_{m=a}^{b} \frac{c_1 - m}{c_2 + m} = c_1 \sum_{m=a}^{b} \frac{1}{c_2 + m} - \sum_{m=a}^{b} \frac{m}{c_2 + m} = c_1 \left(H_{c_2+b} - H_{c_2+a-1}\right) - \sum_{m=a}^{b} \frac{m}{c_2 + m} \tag{18}$$

where $\sum_{m=a}^{b} \frac{m}{c_2 + m}$ can be evaluated as follows:

$$\begin{aligned}
b - a + 1 &= \sum_{m=a}^{b} \frac{c_2 + m}{c_2 + m} = c_2 \sum_{m=a}^{b} \frac{1}{c_2 + m} + \sum_{m=a}^{b} \frac{m}{c_2 + m} \\
\Leftrightarrow\ \sum_{m=a}^{b} \frac{m}{c_2 + m} &= b - a + 1 - c_2 \left(H_{c_2+b} - H_{c_2+a-1}\right).
\end{aligned} \tag{19}$$

So from equations (18) and (19) we conclude that:

$$\sum_{m=a}^{b} \frac{c_1 - m}{c_2 + m} = a - b - 1 + (c_1 + c_2)\left(H_{c_2+b} - H_{c_2+a-1}\right).$$
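Identity (17) is easy to check with exact rational arithmetic. The snippet below is a small sketch written for this purpose only; the function names are hypothetical.

```python
from fractions import Fraction


def harmonic(n):
    """n-th Harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


def lhs_sum(a, b, c1, c2):
    """Left-hand side of (17): sum of (c1 - m)/(c2 + m) for m = a..b."""
    return sum(Fraction(c1 - m, c2 + m) for m in range(a, b + 1))


def rhs_closed_form(a, b, c1, c2):
    """Right-hand side of (17), expressed through Harmonic numbers."""
    return a - b - 1 + (c1 + c2) * (harmonic(c2 + b) - harmonic(c2 + a - 1))
```

Both sides agree for any positive integers with $a \le b$, since each summand equals $\frac{c_1 + c_2}{c_2 + m} - 1$.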



Consider the network of Figure 3 with $k = 2$ parallel erasure links. As shown in equation (9), in order to minimize the expected completion time the routing strategy sends $\frac{n(1-q_1)}{2-q_1-q_2}$ packets over the first link and $\frac{n(1-q_2)}{2-q_1-q_2}$ packets over the second link. Proposition 4 examines this expected transmission time under routing.

Proposition 4. The expected time $\mathbb{E}\hat{T}_n^r$ taken to send by routing $n$ packets from the source to the destination through two parallel erasure links with probabilities of erasure $q_1$ and $q_2$ respectively is

$$\mathbb{E}\hat{T}_n^r = \frac{n}{2 - q_1 - q_2} + U_n^{q_1, q_2}$$

where $U_n^{q_1, q_2}$ is an unbounded term that grows at least as the square root of $n$. The term routing means that out of the $n$ packets, $\frac{n(1-q_1)}{2-q_1-q_2}$ packets are transmitted through the link with $q_1$ probability of erasure and $\frac{n(1-q_2)}{2-q_1-q_2}$ packets through the link with $q_2$ probability of erasure.



Proof: Denote by $A_{i,j}$ the expected time to send $i$ packets over the link with erasure probability $q_1$ and $j$ packets over the link with erasure probability $q_2$. Clearly $\mathbb{E}\hat{T}_n^r = A_{i,j}$ with $i = \frac{n(1-q_1)}{2-q_1-q_2}$, $j = \frac{n(1-q_2)}{2-q_1-q_2}$. $A_{i,j}$ satisfies the following two dimensional recursion formula:

$$\begin{aligned}
A_{i,j} ={}& q_1 q_2 [A_{i,j} + 1] + (1 - q_1) q_2 [A_{i-1,j} + 1] \\
& + q_1 (1 - q_2) [A_{i,j-1} + 1] + (1 - q_1)(1 - q_2) [A_{i-1,j-1} + 1], \\
A_{i,0} ={}& \frac{i}{1 - q_1}, \quad A_{0,j} = \frac{j}{1 - q_2}, \quad A_{0,0} = 0
\end{aligned}$$

or equivalently

$$\begin{aligned}
(1 - q_1 q_2) A_{i,j} ={}& (1 - q_1) q_2 A_{i-1,j} + q_1 (1 - q_2) A_{i,j-1} + (1 - q_1)(1 - q_2) A_{i-1,j-1} + 1, \\
A_{i,0} ={}& \frac{i}{1 - q_1}, \quad A_{0,j} = \frac{j}{1 - q_2}, \quad A_{0,0} = 0.
\end{aligned} \tag{20}$$

The two dimensional recursion formula in (20) has a specific solution $\frac{i}{2(1-q_1)} + \frac{j}{2(1-q_2)}$ and a general solution $B_{i,j}$ where

$$\begin{aligned}
(1 - q_1 q_2) B_{i,j} ={}& (1 - q_1) q_2 B_{i-1,j} + q_1 (1 - q_2) B_{i,j-1} + (1 - q_1)(1 - q_2) B_{i-1,j-1}, \quad i, j \ge 1, \\
B_{i,0} ={}& \frac{i}{2(1 - q_1)}, \quad B_{0,j} = \frac{j}{2(1 - q_2)}.
\end{aligned} \tag{21}$$
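Recursion (20) can be tabulated directly, which gives a reference against which any closed form can be compared. The following is a minimal sketch using exact fractions; the function name `routing_time_table` is hypothetical.

```python
from fractions import Fraction


def routing_time_table(i_max, j_max, q1, q2):
    """Table of A[i][j] from recursion (20) with its boundary conditions."""
    A = [[Fraction(0)] * (j_max + 1) for _ in range(i_max + 1)]
    for i in range(1, i_max + 1):
        A[i][0] = i / (1 - q1)            # only the first link is used
    for j in range(1, j_max + 1):
        A[0][j] = j / (1 - q2)            # only the second link is used
    for i in range(1, i_max + 1):
        for j in range(1, j_max + 1):
            rhs = ((1 - q1) * q2 * A[i - 1][j]
                   + q1 * (1 - q2) * A[i][j - 1]
                   + (1 - q1) * (1 - q2) * A[i - 1][j - 1]
                   + 1)
            A[i][j] = rhs / (1 - q1 * q2)
    return A
```

As a check, `A[1][1]` is the expected maximum of two independent geometric random variables, $\frac{1}{1-q_1} + \frac{1}{1-q_2} - \frac{1}{1-q_1 q_2}$, and subtracting the specific solution $\frac{i}{2(1-q_1)} + \frac{j}{2(1-q_2)}$ from the table leaves a sequence that satisfies the homogeneous recursion (21).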



In order to solve equation (21) we will use the Z-transform with respect to $i$. More specifically we define the Z-transform $B_{z,j} = \sum_{i=0}^{\infty} B_{i,j} z^i$.

TABLE II
Some pairs of functions along with their Z-transforms

Sequence                                      Z-transform
$1$                                           $\dfrac{1}{1 - z}$
$i$                                           $\dfrac{z}{(1 - z)^2}$
$b^{-(i+j-t)} \binom{i+j-t-1}{j-1}$           $\dfrac{z^t}{(b - z)^j}$, for $t \le j$

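By multiplying all terms in equation (21) by $z^i$ and summing over $i$ we get:

$$\begin{aligned}
& (1 - q_1 q_2) \sum_{i=1}^{\infty} B_{i,j} z^i = (1 - q_1) q_2 \sum_{i=1}^{\infty} B_{i-1,j} z^i + q_1 (1 - q_2) \sum_{i=1}^{\infty} B_{i,j-1} z^i + (1 - q_1)(1 - q_2) \sum_{i=1}^{\infty} B_{i-1,j-1} z^i \\
\Leftrightarrow\ & (1 - q_1 q_2) \left[ B_{z,j} - B_{0,j} \right] = z (1 - q_1) q_2 B_{z,j} + q_1 (1 - q_2) \left[ B_{z,j-1} - B_{0,j-1} \right] + z (1 - q_1)(1 - q_2) B_{z,j-1}.
\end{aligned} \tag{22}$$

Since $B_{0,j} = \frac{j}{2(1 - q_2)}$ the above equation becomes:

$$\begin{aligned}
\left[ (1 - q_1 q_2) - z (1 - q_1) q_2 \right] B_{z,j} &= \left[ q_1 (1 - q_2) + z (1 - q_1)(1 - q_2) \right] B_{z,j-1} + j \frac{1 - q_1}{2(1 - q_2)} + \frac{q_1}{2}, \\
B_{z,0} &= \sum_{i=0}^{\infty} B_{i,0} z^i = \sum_{i=0}^{\infty} \frac{i}{2(1 - q_1)} z^i = \frac{z}{2(1 - q_1)(1 - z)^2}
\end{aligned} \tag{23}$$

where equation (23) is a one dimensional recursion formula with the following general solution [21, Section 3.2]:

$$B_{z,j} = \frac{z}{(1 - q_1)(1 - z)^2} \left[ \frac{q_1 (1 - q_2) + z (1 - q_1)(1 - q_2)}{1 - q_1 q_2 - z (1 - q_1) q_2} \right]^j + \frac{j}{2(1 - q_2)(1 - z)} - \frac{z}{2(1 - q_1)(1 - z)^2}. \tag{24}$$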



Equation (24) can be written in a compact form

$$B_{z,j} = a(z) \cdot b(z,j) + d(z,j)$$

by defining the functions $a(z)$, $b(z,j)$ and $d(z,j)$ as follows:

$$\begin{aligned}
a(z) &= \frac{z}{(1 - q_1)(1 - z)^2}, \\
b(z,j) &= \left[ \frac{q_1 (1 - q_2) + z (1 - q_1)(1 - q_2)}{1 - q_1 q_2 - z (1 - q_1) q_2} \right]^j, \\
d(z,j) &= \frac{j}{2(1 - q_2)(1 - z)} - \frac{z}{2(1 - q_1)(1 - z)^2}.
\end{aligned} \tag{25}$$

Now we are ready to compute the inverse Z-transform of $B_{z,j}$. Using Table II along with equation (25):

$$\begin{aligned}
B_{i,j} &= Z^{-1}\{a(z) \cdot b(z,j)\} + Z^{-1}\{d(z,j)\} \\
\Leftrightarrow\ B_{i,j} &= \sum_{m=0}^{i} a(i - m) \cdot b(m,j) + \frac{j}{2(1 - q_2)} - \frac{i}{2(1 - q_1)}
\end{aligned}$$

where $a(i)$ and $b(i,j)$ are the inverse Z-transforms of $a(z)$ and $b(z,j)$ respectively. From Table II, $a(i) = \frac{i}{1 - q_1}$ and therefore the equation above becomes

$$B_{i,j} = \sum_{m=0}^{i} \frac{i - m}{1 - q_1}\, b(m,j) + \frac{j}{2(1 - q_2)} - \frac{i}{2(1 - q_1)}. \tag{26}$$



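The remaining step in order to compute $B_{i,j}$ is to evaluate $b(i,j)$:

$$b(i,j) = Z^{-1}\left\{ \left[ \frac{q_1 (1 - q_2) + z (1 - q_1)(1 - q_2)}{1 - q_1 q_2 - z (1 - q_1) q_2} \right]^j \right\}$$

$$= Z^{-1}\left\{ \frac{1}{[(1 - q_1) q_2]^j} \sum_{t=0}^{j} \binom{j}{t} \frac{z^t (1 - q_1)^t (1 - q_2)^t [q_1 (1 - q_2)]^{j-t}}{\left( \frac{1 - q_1 q_2}{(1 - q_1) q_2} - z \right)^j} \right\}$$

$$= \left[ \frac{q_1 (1 - q_2)}{q_2 (1 - q_1)} \right]^j \sum_{t=0}^{j} \binom{j}{t} \left[ \frac{1 - q_1}{q_1} \right]^t \binom{i + j - t - 1}{j - 1} \left[ \frac{(1 - q_1) q_2}{1 - q_1 q_2} \right]^{i + j - t}$$

$$= \frac{(q_1 (1 - q_2))^j ((1 - q_1) q_2)^i}{(1 - q_1 q_2)^{i+j}} \sum_{t=0}^{j} \binom{j}{t} \binom{i + j - t - 1}{j - 1} \left( \frac{1 - q_1 q_2}{q_1 q_2} \right)^t.$$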



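Therefore equation (26) becomes

$$B_{i,j} = \left[ \frac{q_1 (1 - q_2)}{1 - q_1 q_2} \right]^j \sum_{m=0}^{i} \sum_{t=0}^{j} \frac{i - m}{1 - q_1} \left[ \frac{(1 - q_1) q_2}{1 - q_1 q_2} \right]^m \binom{j}{t} \binom{m + j - t - 1}{j - 1} \left( \frac{1 - q_1 q_2}{q_1 q_2} \right)^t + \frac{j}{2(1 - q_2)} - \frac{i}{2(1 - q_1)}$$

and since the expected time $A_{i,j} = B_{i,j} + \frac{i}{2(1 - q_1)} + \frac{j}{2(1 - q_2)}$ then

$$A_{i,j} = \left[ \frac{q_1 (1 - q_2)}{1 - q_1 q_2} \right]^j \sum_{m=0}^{i} \sum_{t=0}^{j} \frac{i - m}{1 - q_1} \left[ \frac{(1 - q_1) q_2}{1 - q_1 q_2} \right]^m \binom{j}{t} \binom{m + j - t - 1}{j - 1} \left( \frac{1 - q_1 q_2}{q_1 q_2} \right)^t + \frac{j}{1 - q_2}. \tag{27}$$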
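
As a numerical sanity check, the closed form (27) can be compared against a direct dynamic-programming evaluation of the expected time. The sketch below is illustrative only. It assumes that $A_{i,j}$ obeys the one-slot recursion $(1 - q_1 q_2) A_{i,j} = 1 + (1 - q_1) q_2 A_{i-1,j} + q_1 (1 - q_2) A_{i,j-1} + (1 - q_1)(1 - q_2) A_{i-1,j-1}$ with $A_{i,0} = i/(1 - q_1)$ and $A_{0,j} = j/(1 - q_2)$, which is what the recursion for $B_{i,j}$ implies through $A_{i,j} = B_{i,j} + \frac{i}{2(1-q_1)} + \frac{j}{2(1-q_2)}$. The function names are hypothetical, and the closed form is evaluated for $j \ge 1$ with the convention that a binomial coefficient vanishes when its upper index is smaller than its lower index.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient with the convention binom(n, k) = 0 when n < k."""
    return comb(n, k) if n >= k >= 0 else 0


def expected_time_dp(i, j, q1, q2):
    """Expected slots until i packets cross link 1 and j packets cross link 2.

    Assumed recursion: condition on the outcome of one time slot, where
    link 1 erases with probability q1 and link 2 with probability q2.
    """
    A = [[0.0] * (j + 1) for _ in range(i + 1)]
    for a in range(1, i + 1):
        A[a][0] = a / (1 - q1)          # only link 1 still has packets to send
    for b in range(1, j + 1):
        A[0][b] = b / (1 - q2)          # only link 2 still has packets to send
    for a in range(1, i + 1):
        for b in range(1, j + 1):
            A[a][b] = (1
                       + (1 - q1) * q2 * A[a - 1][b]
                       + q1 * (1 - q2) * A[a][b - 1]
                       + (1 - q1) * (1 - q2) * A[a - 1][b - 1]) / (1 - q1 * q2)
    return A[i][j]


def expected_time_closed_form(i, j, q1, q2):
    """Closed form of equation (27), valid for j >= 1."""
    E = q1 * (1 - q2) / (1 - q1 * q2)
    W = (1 - q1) * q2 / (1 - q1 * q2)
    F = (1 - q1 * q2) / (q1 * q2)
    total = 0.0
    for m in range(i + 1):
        for t in range(j + 1):
            total += ((i - m) / (1 - q1) * W ** m
                      * binom(j, t) * binom(m + j - t - 1, j - 1) * F ** t)
    return E ** j * total + j / (1 - q2)


if __name__ == "__main__":
    for args in [(1, 1, 0.5, 0.5), (2, 1, 0.5, 0.5), (3, 4, 0.2, 0.7)]:
        print(args, expected_time_dp(*args), expected_time_closed_form(*args))
```

For $i = j = 1$ both routines reduce to the expected maximum of two geometric random variables, $\frac{1}{1-q_1} + \frac{1}{1-q_2} - \frac{1}{1-q_1 q_2}$.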



We are interested in evaluating $\mathbb{E}\hat{T}_n = A_{i,j}$ for $i = \frac{n(1 - q_1)}{2 - q_1 - q_2}$ and $j = \frac{n(1 - q_2)}{2 - q_1 - q_2}$; therefore from equation (27) we get

$$\mathbb{E}\hat{T}_n = \frac{n}{2 - q_1 - q_2} + U_n^{q_1,q_2}$$

where

$$U_n^{q_1,q_2} = \left[ \frac{q_1 (1 - q_2)}{1 - q_1 q_2} \right]^{j} \sum_{m=0}^{i} \sum_{t=0}^{j} \frac{i - m}{1 - q_1} \left[ \frac{(1 - q_1) q_2}{1 - q_1 q_2} \right]^m \binom{j}{t} \binom{m + j - t - 1}{j - 1} \left( \frac{1 - q_1 q_2}{q_1 q_2} \right)^t$$

evaluated at the above values of $i$ and $j$,



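with $\binom{m}{w} = 0$ if $m < w$. If we define $W = \frac{(1 - q_1) q_2}{1 - q_1 q_2}$, $E = \frac{q_1 (1 - q_2)}{1 - q_1 q_2}$ and $F = \frac{1 - q_1 q_2}{q_1 q_2}$ then the above expression can be written more compactly, with $i = \frac{n(1 - q_1)}{2 - q_1 - q_2}$ and $j = \frac{n(1 - q_2)}{2 - q_1 - q_2}$, as

$$U_n^{q_1,q_2} = E^{j} \sum_{m=0}^{i} \sum_{t=0}^{j} \frac{i - m}{1 - q_1} \binom{j}{t} \binom{m + j - t - 1}{j - 1} W^m F^t.$$
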


In order to prove that function $U_n^{q_1,q_2}$ is unbounded we will prove that $U_n^{q_1,q_2}$ is larger than another, simpler to analyze function that goes to infinity, and therefore $U_n^{q_1,q_2}$ also increases to infinity. Indeed, with $i = \frac{n(1 - q_1)}{2 - q_1 - q_2}$ and $j = \frac{n(1 - q_2)}{2 - q_1 - q_2}$, and using $\binom{m + j - t - 1}{j - 1} = \frac{j}{m + j - t} \binom{m + j - t}{j}$, the equation above can be written as

$$U_n^{q_1,q_2} = \frac{E^{j}}{1 - q_1} \sum_{m=0}^{i} \sum_{t=0}^{j} (i - m) \frac{j}{m + j - t} \binom{j}{t} \binom{m + j - t}{j} W^m F^t$$

$$\ge \frac{n (1 - q_2) E^{j}}{(1 - q_1)(2 - q_1 - q_2)} \sum_{m=0}^{i} \sum_{t=0}^{j} \binom{j}{t} \binom{j + m - t}{j} \frac{i - m}{j + m} W^m F^t$$

where the inequality holds because $t \ge 0$,



and since all terms in the above double sum are non-negative we can disregard as many terms as we wish without violating the direction of the inequality. Specifically, with $i = \frac{n(1 - q_1)}{2 - q_1 - q_2}$ and $j = \frac{n(1 - q_2)}{2 - q_1 - q_2}$,

$$U_n^{q_1,q_2} \ge \frac{n (1 - q_2) E^{j}}{(1 - q_1)(2 - q_1 - q_2)} \sum_{m \in J, \, t \in G} \binom{j}{t} \binom{j + m - t}{j} \frac{i - m}{j + m} W^m F^t \tag{28}$$



where J = {[I^(l-i)l J ... J i^},G = {r(l-9 1 )^^(l-^)l ) ...a(l-9 1 )|^j}and [x\, W are 



the floor and the ceiling functions respectively. 



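Fig. 4. The region $\mathcal{N}$ on which function $g(\alpha, \beta)$ is defined.

By using the lower and upper Stirling-based bound [22]:

$$\sqrt{2 \pi n} \left( \frac{n}{e} \right)^n < n! < \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n e^{\frac{1}{12 n}}, \quad n \ge 1$$

one can find that

$$\binom{n}{\beta n} \ge \frac{1}{\sqrt{2 \pi \beta (1 - \beta) n}} \cdot 2^{n H(\beta)} \cdot e^{-\frac{1}{12 n \beta (1 - \beta)}}, \quad \beta \in (0, 1)$$

and

$$\binom{\beta n}{n} \ge \sqrt{\frac{\beta}{2 \pi (\beta - 1) n}} \cdot 2^{\beta n H(1/\beta)} \cdot e^{-\frac{\beta}{12 n (\beta - 1)}}, \quad \beta > 1$$

where $H(\beta) = -\beta \log_2(\beta) - (1 - \beta) \log_2(1 - \beta)$ is the entropy function and therefore using inequality (28) we can derive: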



n(l — q±) 

V yi; meJ,teG 2-91-92 + v 7 



where M = T = /(a, = / ^ggfg^ . *(a, = ^gi^ + ^ and 



$$g(\alpha, \beta) = \log_2(E) + \frac{1 - q_1}{1 - q_2} \alpha \log_2(W) + H(\beta) + \left( 1 + \alpha \frac{1 - q_1}{1 - q_2} - \beta \right) H\left( \frac{1}{1 + \alpha \frac{1 - q_1}{1 - q_2} - \beta} \right) + \beta \log_2(F).$$



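Since $1 - \frac{1}{\sqrt{n}} \le \alpha \le 1$ and $(1 - q_1) - \frac{1}{\sqrt{n}} \le \beta \le (1 - q_1)$ we define functions $f(\alpha, \beta)$, $h(\alpha, \beta)$ and $g(\alpha, \beta)$ within the region $\mathcal{N} = \left[ 1 - \frac{1}{\sqrt{n}}, 1 \right] \times \left[ 1 - q_1 - \frac{1}{\sqrt{n}}, 1 - q_1 \right]$. Moreover we are only concerned with large enough $n$ so that $0 < \beta < \alpha$ and
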

region AT looks like the one in Figure [4] For large values of n, f(a, (3) > \J 2 q 1 (i- q 7) an< ^ /^) < 1 + {i-qi)q\ qi(i-gi) 
within region A~ and therefore from inequality ( f29T > we get: 



i(i-?0 

a-gi-Sa ri ' 2 ( 1 -'?2) i 2 > ^-^ „ m »(i-<ra) i« t\ 

U<li.,<ll > e ~ 12r.(l-<, 2 ) ( L + (l-n) g2 + 91 (1-91) > \ ^~ <?1 ~ <?2 2 2 -H-<!2 5 lM'Tj 

V8^i(i-9i) 3 m ir 6G S^ + - 

_i qi) 

for large enough n. 

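Function $g(\alpha, \beta)$ satisfies the following three conditions:

1) $\frac{\partial g}{\partial \alpha} = \frac{1 - q_1}{1 - q_2} \log_2\left( W \, \frac{\alpha (1 - q_1) + (1 - \beta)(1 - q_2)}{\alpha (1 - q_1) - \beta (1 - q_2)} \right)$ and $\frac{\partial g}{\partial \beta} = \log_2\left( F \, \frac{(1 - \beta) [\alpha (1 - q_1) - \beta (1 - q_2)]}{\beta [\alpha (1 - q_1) + (1 - \beta)(1 - q_2)]} \right)$

2) $\frac{\partial^2 g}{\partial \alpha^2} = - \frac{(1 - q_1)^2}{[\alpha (1 - q_1) - \beta (1 - q_2)] [\alpha (1 - q_1) + (1 - \beta)(1 - q_2)] \ln 2} < 0$

3) $\frac{\partial^2 g}{\partial \alpha^2} \frac{\partial^2 g}{\partial \beta^2} - \frac{\partial^2 g}{\partial \alpha \partial \beta} \frac{\partial^2 g}{\partial \beta \partial \alpha} = \frac{(1 - q_1)^2}{\beta (1 - \beta) [\alpha (1 - q_1) + (1 - \beta)(1 - q_2)] [\alpha (1 - q_1) - \beta (1 - q_2)] (\ln 2)^2} > 0$

It is easy to see from condition 1 that $\left. \frac{\partial g(\alpha, \beta)}{\partial \alpha} \right|_{(1, 1 - q_1)} = 0$ and $\left. \frac{\partial g(\alpha, \beta)}{\partial \beta} \right|_{(1, 1 - q_1)} = 0$. Moreover conditions 2 and 3 show the concavity of $g(\alpha, \beta)$ within region $\mathcal{N}$, and along with condition 1 this proves that function $g(\alpha, \beta)$ achieves a maximum at point $(\alpha, \beta) = (1, 1 - q_1)$. Therefore $g(\alpha, \beta) \le g(1, 1 - q_1) = 0$, making the exponent of 2 in (30) non-positive and guaranteeing an exponential decay of each term in the sum. Since region $\mathcal{N}$ is compact and convex and function $g(\alpha, \beta)$ is concave, $g$ achieves its minimum on the boundary of $\mathcal{N}$. It is not difficult to show that $\frac{\partial g(\alpha, 1 - q_1)}{\partial \alpha} > 0$ for $\alpha < 1$ and therefore function $g(\alpha, 1 - q_1)$ decreases in value from point I to point IV. Similarly $\frac{\partial g(1, \beta)}{\partial \beta} > 0$ for $\beta < 1 - q_1$ and therefore function $g(1, \beta)$ decreases in value from point I to point II. Since $\frac{\partial g(\alpha, 1 - q_1 - 1/\sqrt{n})}{\partial \alpha} > 0$ for $\alpha < 1$ and $\frac{\partial g(1 - 1/\sqrt{n}, \beta)}{\partial \beta} > 0$ for $\beta < 1 - q_1$, with similar arguments as above we show that the minimum value for $g(\alpha, \beta)$ within $\mathcal{N}$ is achieved at point $C = (\alpha_m, \beta_m) = \left( 1 - \frac{1}{\sqrt{n}}, 1 - q_1 - \frac{1}{\sqrt{n}} \right)$. Therefore $g(\alpha, \beta) \ge g(\alpha_m, \beta_m)$ for every $(\alpha, \beta) \in \mathcal{N}$, or else from equation (30):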

J75l,« > e (l- q 2 )0l 2 jjlz^ g(Qm ,ff m ) 2 - gl -, a ™ 

(2- qi -q 2 W8^ qi (l- qi ) ) £ tj ^- +m 

Using the Taylor expansion of function $r(x) = g(1 - x, 1 - q_1 - x)$ around $x = 0$ we get the following expression:

/( , )= g f( g2 - gl )- g2 (l-gf) 2 + 0( , 3) . 
(1 - qi)qiq2{l - qiq 2 )\n2 



For $x = \frac{1}{\sqrt{n}}$ we get



n(l-ft) , „ , (1-92) (g?(ga -«i) -92(1 -9?)) f 1 

-5 («m, /3m) = 7^ r \~j 7; + 



2-91-92 ' (2 - 91 - q 2 )(l - 9i)9i92(l - 9192) In 2 VV™ 

where along with Proposition [3] we get 

e~ l (l — Go) /n Cl-92)(9?(92-9l)-92(l-9?)) 
(/91.92 > V 9 (2-91 -9 2 )(l-9l)91 92(1-91 9 2 ) 1" 2 + 7S/(n.) (31) 

{2- qi -q 2 )^^ qi {l- qi ) 

where t(n) = n (H n — -H„_fc(„)_i) — — 1 and fc(n) = A^fn with ^4 = 2 ^^q 2 ■ The above expression can be simplified 
by using the bounds proved by Young in [23]:

$$\ln n + \gamma + \frac{1}{2(n + 1)} < H_n < \ln n + \gamma + \frac{1}{2n}$$

where $\gamma$ is Euler's constant. We obtain from (31):

e^d-Q^Jn (l-92)(9?(92-9l)-92(l-9;)) c 

[/<?1^ > i 9 (2-91 -92)(l-9l)91 92(1-91 92) l°2 + 7^r/)(n.) (32) 

(2-91-^)^8^1(1-91) 

where <j>{n) = n In ( ^fo^ ) - 2(n " +1) ^^pi ~ - 1- » can be easily proved that function w(n) = n In ( — jfep? ) - 
fc(n) — 1 is greater than 4j- for n > 1. Indeed 

uj"{n) = A( A r n2 3/9 > for n > 1 ( 33 ) 

4(n — A^/ji — l)^n ,:i '' i 

and since $\lim_{n \to +\infty} \omega'(n) = 0$ it means that $\omega'(n) < 0$ for $n \ge 1$ and therefore $\omega(n)$ is a decreasing function of $n \ge 1$. Moreover

In f u\ i ) ~ — •, fc(n) fc 2 (n) 2 ,2 
t / \ ,. \n-k(n)-l J n L Hospital -=!T i ^2 rr /l 

lim win) = lim — = — 1 = lim — ^- V - — - 1 = — 

n-++oo n->+oo A n^+oo -Jj.(2 + 2fc(n) - 2n) 2 



and therefore $\omega(n) > \frac{A^2}{2}$ for $n \ge 1$. Finally inequality (32) becomes



(2-gi - g 2 ) v /87r2 '?i( 1 " 9i ) \2V 2 -9i-?2/ 2(n + 1) n - fe(n) - 1J ' 

Clearly the above function is unbounded and $U_n^{q_1,q_2}$ increases with respect to $n$ at least as $\sqrt{n}$. ■
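
The $\sqrt{n}$ growth can also be observed numerically. The following sketch is an illustration under stated assumptions, not part of the proof: it evaluates the expected completion time of routing over two parallel erasure links by dynamic programming, assuming the one-slot recursion $(1 - q_1 q_2) A_{i,j} = 1 + (1 - q_1) q_2 A_{i-1,j} + q_1 (1 - q_2) A_{i,j-1} + (1 - q_1)(1 - q_2) A_{i-1,j-1}$, and subtracts the capacity term $n / (2 - q_1 - q_2)$. The function names and the rounding of $i$ to an integer are choices made for this example.

```python
from math import sqrt


def expected_max_time(i, j, q1, q2):
    """Expected slots until i packets cross link 1 and j packets cross link 2."""
    prev = [b / (1 - q2) for b in range(j + 1)]      # row with no packets on link 1
    for a in range(1, i + 1):
        cur = [a / (1 - q1)] + [0.0] * j             # column with no packets on link 2
        for b in range(1, j + 1):
            cur[b] = (1
                      + (1 - q1) * q2 * prev[b]
                      + q1 * (1 - q2) * cur[b - 1]
                      + (1 - q1) * (1 - q2) * prev[b - 1]) / (1 - q1 * q2)
        prev = cur
    return prev[j]


def routing_excess_delay(n, q1, q2):
    """Sub-linear delay term: expected time minus n / (2 - q1 - q2)."""
    i = round(n * (1 - q1) / (2 - q1 - q2))          # packets routed over link 1
    j = n - i                                        # packets routed over link 2
    return expected_max_time(i, j, q1, q2) - n / (2 - q1 - q2)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        u = routing_excess_delay(n, 0.5, 0.5)
        print(n, u, u / sqrt(n))
```

When $n$ is multiplied by four, the excess delay should roughly double, which is the behavior expected of a term growing as $\sqrt{n}$.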
Now we have all the necessary tools to prove the following theorem for $k$-parallel path multi-hop networks as shown in Figure 2.

Proof of Theorem: Without loss of generality, due to [16], we can interchange the first link of each of the $k$ line networks with the worst link of the line network. The first term in equation (2) is due to the capacity of the $k$ parallel multi-hop line networks. The second term $D_n^r$ is sublinear in $n$; what is left to prove is that $D_n^r$ grows as $\Omega(\sqrt{n})$. This follows from Proposition 4. The number of packets transmitted on the first two paths is

$$n_1 = \frac{n \left( 1 - \max_{1 \le j \le \ell} p_{1j} \right)}{k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij}} \quad \text{and} \quad n_2 = \frac{n \left( 1 - \max_{1 \le j \le \ell} p_{2j} \right)}{k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij}}$$

respectively. The time $T_n^r$ taken to send $n$ packets through the $k$-parallel path multi-hop network is greater than the time $\hat{T}_{n'}$ taken for $n_1$ packets to reach node $N_{11}$ and $n_2$ packets to reach node $N_{21}$. Therefore from Proposition 4

$$\mathbb{E} T_n^r \ge \mathbb{E} \hat{T}_{n'} = \frac{n}{k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij}} + U_{n'}^{\max_{1 \le j \le \ell} p_{1j}, \, \max_{1 \le j \le \ell} p_{2j}}$$

where $n' = n \left( 2 - \max_{1 \le j \le \ell} p_{1j} - \max_{1 \le j \le \ell} p_{2j} \right) \Big/ \left( k - \sum_{i=1}^{k} \max_{1 \le j \le \ell} p_{ij} \right)$ is proportional to $n$. By Proposition 4, $U_{n'}^{\max_{1 \le j \le \ell} p_{1j}, \, \max_{1 \le j \le \ell} p_{2j}}$ grows as $\Omega(\sqrt{n'})$. Thus, $D_n^r$ grows as $\Omega(\sqrt{n})$.



V. General network topologies 
We next consider networks with general topologies. 

Lemma 1. In a single-bottleneck network, there exists a max-flow subgraph comprising paths each of which has a single 
worst link. 

Proof: Given a network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with a single minimum cut, let $(v_1, w_1), \ldots, (v_k, w_k)$ be the edges crossing the minimum cut. Let $\mathcal{G}'$ be a max-flow subgraph. Consider the network $\mathcal{G} - \mathcal{G}'$ obtained from $\mathcal{G}$ by reducing the capacity of each link in $\mathcal{E}$ by the capacity of the corresponding link in $\mathcal{G}'$, if any. There is a path from the source to each node $v_i$, $1 \le i \le k$ (which may not all be distinct), otherwise this would contradict the assumption that there is a single minimum cut. Thus, we can find a subgraph $\mathcal{G}''$ comprising a set of paths of nonzero and nonoverlapping capacity from the source to each distinct node $v_i$, $1 \le i \le k$. Similarly, we can find a subgraph $\mathcal{G}'''$ comprising a set of paths of nonzero and nonoverlapping capacity from each distinct node $w_i$, $1 \le i \le k$, to the sink. We can then decompose the union of subgraphs $\mathcal{G}' + \mathcal{G}'' + \mathcal{G}'''$ (obtained by adding the capacities of corresponding links) into a sufficiently large number of paths each of which has a single worst link corresponding to the min cut of the original network. ■

Proof of Theorem [?]: The expected time $\mathbb{E} T_n^r$ required to send all $n$ packets by routing through network $\mathcal{G}$ from source $S$ to destination $T$ is greater than the time $\mathbb{E} \tilde{T}_n^r$ it would take the $n$ packets to cross the mincut of the network by routing. Specifically, if we assume that all nodes on the source's side of the cut are collapsed into a super source node and all nodes on the sink's side of the cut are collapsed into a super destination node, then the network becomes a parallel erasure links network as shown in Figure 3. Then

$$\mathbb{E} T_n^r \ge \mathbb{E} \tilde{T}_n^r = \frac{n}{C} + D_n^r$$

where $D_n^r \in \Omega(\sqrt{n})$ by Theorem 3.

Fig. 5. (a) Network $\mathcal{G}$ with a single source $S$, a single destination $T$, an intermediate node $A$, and four erasure links 1, 2, 3, and 4 with probabilities of erasure 0.5, 0.4, 0.8, 0.9 respectively. (b) The solution of the linear program on network $\mathcal{G}$ would give us three rates $\lambda_1 = 0.5$, $\lambda_2 = 0.2$, and $\lambda_3 = 0.1$. (c) Network $\hat{\mathcal{G}}$ derived from the solution of the linear program.

For the case of coding on a network $\mathcal{G}$, for any max-flow subgraph (composed of flows on paths from source $S$ to destination $T$), one can construct a parallel path network $\hat{\mathcal{G}}$ that requires at least as much time to send the $n$ packets from the source to the destination.

Denote by $\mathcal{F}$ the set of source-sink flows in the max-flow subgraph. For each flow $f \in \mathcal{F}$, let $\lambda_f$ denote the flow rate and let $\mathcal{P}_f$ denote the path of flow $f$. For each node $v \in \mathcal{V}$ in network $\mathcal{G}$, let $\mathcal{K}_v$ denote the set of flows passing through node $v$, where $\mathcal{K}_S$ and $\mathcal{K}_T$ are equal to the set of all flows in network $\mathcal{G}$. For each edge $e \in \mathcal{E}$ let $\mathcal{F}_e$ denote the set of flows passing through edge $e$. For the example in Figure 5(b), $\mathcal{F} = \{1, 2, 3\}$, $\lambda_1 = 0.5$, $\lambda_2 = 0.2$, $\lambda_3 = 0.1$, $\mathcal{P}_1 = S \to T$ for flow 1, $\mathcal{P}_2 = S \to A \to T$ for flow 2, and $\mathcal{P}_3 = S \to A \to T$ for flow 3, $\mathcal{K}_A = \{2, 3\}$, and $\mathcal{F}_1 = \{1\}$, $\mathcal{F}_2 = \{2, 3\}$, $\mathcal{F}_3 = \{2\}$, and $\mathcal{F}_4 = \{3\}$.

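The process of creating network $\hat{\mathcal{G}} = (\hat{\mathcal{V}}, \hat{\mathcal{E}})$ from $\mathcal{G}$ is the following.

1) For every node $v \in \mathcal{G}$, create a set of nodes $\hat{\mathcal{V}}_v = \{v_f : f \in \mathcal{K}_v\}$. The set of nodes $\hat{\mathcal{V}}$ is defined as $\bigcup_{v \in \mathcal{V}} \hat{\mathcal{V}}_v$.

2) The edges of network $\hat{\mathcal{G}}$ are created as follows. For each flow $f \in \mathcal{F}$ and for each edge $(u, v)$ in path $\mathcal{P}_f$ of flow $f$, create an edge in network $\hat{\mathcal{G}}$ from $u_f$ to $v_f$ with probability of erasure

$$\hat{p}_{(u_f, v_f)} = 1 - \frac{\lambda_f}{\sum_{g \in \mathcal{F}_{(u,v)}} \lambda_g} \left( 1 - p_{(u,v)} \right),$$

where $p_{(u,v)}$ is the probability of erasure of link $(u, v)$ in network $\mathcal{G}$. Define a function $H((u_f, v_f)) = \frac{\lambda_f}{\sum_{g \in \mathcal{F}_{(u,v)}} \lambda_g}$.

3) Collapse all nodes of set $\hat{\mathcal{V}}_S$ to a single node $S$ that denotes the source in network $\hat{\mathcal{G}}$, and collapse all nodes of set $\hat{\mathcal{V}}_T$ to a single node $T$ that denotes the destination in network $\hat{\mathcal{G}}$.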

The process above splits every node $v \in \mathcal{V}$ into $|\mathcal{K}_v|$ separate nodes and splits every edge $e \in \mathcal{E}$ into $|\mathcal{F}_e|$ separate edges. The sum of capacities of all edges that edge $e$ is split into is equal to the capacity of edge $e$. The result of applying this procedure to network $\mathcal{G}$ of Figure 5(b) is shown in Figure 5(c). In network $\hat{\mathcal{G}}$ erasure events on different links are not independent. For every edge $(u, v) \in \mathcal{E}$, denote by $\mathcal{C}_{(u,v)} = \{(\hat{u}, \hat{v}) \in \hat{\mathcal{E}} : \hat{u} \in \hat{\mathcal{V}}_u, \hat{v} \in \hat{\mathcal{V}}_v\}$ the set of edges in $\hat{\mathcal{G}}$ that are derived from edge $(u, v) \in \mathcal{E}$. The erasures on all edges in set $\mathcal{C}_{(u,v)}$ are correlated as follows. At each time step, with probability $1 - p_{(u,v)}$ one edge in set $\mathcal{C}_{(u,v)}$ succeeds, or all fail with probability $p_{(u,v)}$. In the case of a success, edge $e \in \mathcal{C}_{(u,v)}$ is the single successful edge with probability $H(e)$.
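
The edge-splitting step can be illustrated on the example of Figure 5. The sketch below is a hedged illustration: it assumes that each original edge is shared among the flows crossing it in proportion to their rates, so that the per-flow capacities add up to the capacity of the original edge as stated above. The function name `split_network` and the dictionary-based data layout are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def split_network(erasure, flows):
    """Split every edge of the original network among the flows that use it.

    erasure: dict mapping an edge label to its erasure probability.
    flows:   dict mapping a flow label to (rate, list of edges on its path).
    Returns (flows_on_edge, split_erasure), where split_erasure[(e, f)] is the
    erasure probability of the copy of edge e dedicated to flow f.
    Assumption: edge e is shared in proportion to the flow rates.
    """
    flows_on_edge = defaultdict(list)
    for f, (_rate, path) in flows.items():
        for e in path:
            flows_on_edge[e].append(f)

    split_erasure = {}
    for e, fs in flows_on_edge.items():
        total_rate = sum(flows[f][0] for f in fs)
        for f in fs:
            share = flows[f][0] / total_rate          # probability H(e) of the text
            split_erasure[(e, f)] = 1 - share * (1 - erasure[e])
    return dict(flows_on_edge), split_erasure


# Example of Figure 5: links 1..4 with erasure probabilities 0.5, 0.4, 0.8, 0.9.
erasure = {1: 0.5, 2: 0.4, 3: 0.8, 4: 0.9}
flows = {1: (0.5, [1]), 2: (0.2, [2, 3]), 3: (0.1, [2, 4])}
flows_on_edge, split_erasure = split_network(erasure, flows)

if __name__ == "__main__":
    print(flows_on_edge)
    print(split_erasure)
```

In this example link 2, which carries flows 2 and 3, is split into two edges whose capacities sum to the original capacity $1 - 0.4 = 0.6$.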

The time taken for the $n$ packets to travel through network $\hat{\mathcal{G}}$ by coding is at least as large as the time taken in network $\mathcal{G}$, i.e.

$$\mathbb{E} T_n^c \le \mathbb{E} \hat{T}_n^c. \tag{34}$$

Indeed network $\hat{\mathcal{G}}$ can be emulated by network $\mathcal{G}$ if each node $v \in \mathcal{G}$ has $|\mathcal{K}_v|$ different buffers and packets between different buffers are not mixed. By construction, networks $\mathcal{G}$ and $\hat{\mathcal{G}}$ have the same capacity and since $\hat{\mathcal{G}}$ is a parallel path network, the mincut of network $\hat{\mathcal{G}}$ passes through the worst link of each path. According to Theorem 2

$$\mathbb{E} \hat{T}_n^c = \frac{n}{C} + D_n^c \tag{35}$$

where $D_n^c \in O(\sqrt{n})$ when there are multiple worst links in at least one path or $D_n^c \in O(1)$ when there is a single worst link at each path. For a single-bottleneck network, by Lemma 1 one can construct a max-flow subgraph comprising paths each of which has a single worst link, so $D_n^c \in O(1)$. Equations (34) and (35) conclude our proof. ■

VI. Proof of concentration 
Here we present a martingale concentration argument. In particular we prove a slightly stronger version of Theorem 5.

Theorem 6 (Extended version of Theorem 5). The time $T_n^c$ for $n$ packets to be transmitted from a source to a sink over a network of erasure channels using network coding is concentrated around its expected value with high probability. In particular for sufficiently large $n$:

$$\mathbb{P}\left[ |T_n^c - \mathbb{E} T_n^c| > \epsilon_n \right] \le \frac{2C}{n} + \frac{2 C n^{2\delta}}{n^2 - n^{1 + 2\delta}}$$

where $C$ is the capacity of the network and $\epsilon_n$ represents the corresponding deviation and is equal to $\epsilon_n = n^{1/2 + \delta} / C$, $\delta \in (0, 1/2)$.

Proof: The main idea of the proof is to use the method of martingale bounded differences [24]. This method works as follows: first we show that the random variable whose concentration we want to establish is a function of a finite set of independent random variables. Then we show that this function is Lipschitz with respect to these random variables, i.e. it cannot change its value too much if only one of these variables is modified. Using this function we construct the corresponding Doob martingale and use the Azuma-Hoeffding inequality [24] to establish concentration. See also [25], [26] for related concentration results using similar martingale techniques. Unfortunately, however, this method does not seem to be directly applicable to $T_n^c$ because it cannot be naturally expressed as a function of a bounded number of independent random variables. We use the following trick of showing concentration for another quantity first and then linking that concentration to the concentration of $T_n^c$.

Specifically, we define $R_t$ to be the number of innovative (linearly independent) packets received at the destination node $T$ after $t$ time steps. $R_t$ is linked with $T_n^c$ through the equation:

$$T_n^c = \arg_t (R_t = n). \tag{36}$$

The number of received packets is a well-defined function of the link states at each time step. If there are $L$ links in network $\mathcal{G}$, then:

$$R_t = g(z_{11}, \ldots, z_{1L}, \ldots, z_{t1}, \ldots, z_{tL}).$$

The random variables $z_{ij}$, $1 \le i \le t$ and $1 \le j \le L$, are equal to 0 or 1 depending on whether link $j$ is OFF or ON at time $i$. If a packet is sent on a link that is ON, it is received successfully; if sent on a link that is OFF, it is erased. It is clear that this function satisfies a bounded Lipschitz condition with a bound equal to 1:

$$\left| g(z_{11}, \ldots, z_{1L}, \ldots, z_{ij}, \ldots, z_{t1}, \ldots, z_{tL}) - g(z_{11}, \ldots, z_{1L}, \ldots, z'_{ij}, \ldots, z_{t1}, \ldots, z_{tL}) \right| \le 1.$$

This is because if we look at the history of all the links failing or succeeding at all the t time slots, changing one of these 
link states in one time slot can at most influence the received rank by one. We note that we assume that coding is performed 
over a very large field to ensure that every packet that could potentially be innovative due to connectivity, indeed is. 
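
The Lipschitz property can be checked empirically in a simple special case. The sketch below is an illustration only and assumes a line network of $L$ links in tandem, in which each link forwards at most one innovative packet per time slot and a packet received in a slot can be forwarded only in a later slot; the source always has packets to send. The function name `received_packets` is hypothetical. Under these assumptions, flipping a single link state $z_{ij}$ changes the number of packets delivered by at most one.

```python
import random


def received_packets(z):
    """Number of innovative packets at the destination of a line network.

    z[i][j] is 1 if link j is ON in time slot i and 0 otherwise.
    delivered[k] counts the packets that have crossed link k so far.
    """
    num_links = len(z[0])
    delivered = [0] * num_links
    for slot in z:
        before = delivered[:]                      # state at the start of the slot
        for k in range(num_links):
            upstream = float("inf") if k == 0 else before[k - 1]
            if slot[k] and upstream > before[k]:   # link ON and a packet is waiting
                delivered[k] = before[k] + 1
    return delivered[-1]


def max_change_after_single_flip(t, num_links, p_on, trials, seed=0):
    """Largest observed change of R_t when one link state is flipped."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        z = [[1 if rng.random() < p_on else 0 for _ in range(num_links)]
             for _ in range(t)]
        base = received_packets(z)
        i, j = rng.randrange(t), rng.randrange(num_links)
        z[i][j] = 1 - z[i][j]                      # modify a single link state
        worst = max(worst, abs(received_packets(z) - base))
    return worst


if __name__ == "__main__":
    print(max_change_after_single_flip(t=30, num_links=4, p_on=0.6, trials=2000))
```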

Using the Azuma-Hoeffding inequality (see Theorem 7 in the Appendix) on the Doob martingale constructed by $R_t = g(z_{11}, \ldots, z_{1L}, \ldots, z_{t1}, \ldots, z_{tL})$ we get the following concentration result:

Proposition 5. The number of received innovative packets $R_t$ is a random variable concentrated around its mean value:

$$\mathbb{P}\left( |R_t - \mathbb{E} R_t| \ge \epsilon_t \right) \le \frac{1}{t} \quad \text{where } \epsilon_t = \sqrt{\frac{tL}{2} \ln(2t)}. \tag{37}$$



Proof: Given in Appendix B. ■

Using this concentration and the relation (36) between $T_n^c$ and $R_t$ we can show that deviations of the order $\epsilon_t = \sqrt{t \ln(2t)}$ for $R_t$ translate to deviations of the order of $\epsilon_n = n^{1/2 + \delta} / C$ for $T_n^c$. In Theorem 6 smaller values of $\delta$ give tighter bounds that hold for larger $n$. Define the events:

$$H_t = \{ |R_t - \mathbb{E} R_t| < \epsilon_t \}$$

and

$$\bar{H}_t = \{ |R_t - \mathbb{E} R_t| \ge \epsilon_t \}$$

and further define $t_n^u$ ($u$ stands for upper bound) to be some $t$, ideally the smallest $t$, such that $\mathbb{E} R_t - \epsilon_t \ge n$ and $t_n^l$ ($l$ stands for lower bound) to be some $t$, ideally the largest $t$, such that $\mathbb{E} R_t + \epsilon_t \le n$. Then we have:

$$\mathbb{P}(T_n^c \ge t_n^u) = \mathbb{P}(T_n^c \ge t_n^u \mid H_{t_n^u}) \cdot \mathbb{P}(H_{t_n^u}) + \mathbb{P}(T_n^c \ge t_n^u \mid \bar{H}_{t_n^u}) \cdot \mathbb{P}(\bar{H}_{t_n^u})$$

where:

• $\mathbb{P}(T_n^c \ge t_n^u \mid H_{t_n^u}) = 0$ since at time $t = t_n^u$ the destination has already received more than $n$ innovative packets. Indeed given that $H_{t_n^u}$ holds: $n \le \mathbb{E} R_{t_n^u} - \epsilon_{t_n^u} < R_{t_n^u}$ where the first inequality is due to the definition of $t_n^u$.

• $\mathbb{P}(H_{t_n^u}) \le 1$

• $\mathbb{P}(T_n^c \ge t_n^u \mid \bar{H}_{t_n^u}) \le 1$

• $\mathbb{P}(\bar{H}_{t_n^u}) \le \frac{1}{t_n^u}$ due to equation (37).

Therefore:

$$\mathbb{P}(T_n^c \ge t_n^u) \le \frac{1}{t_n^u}. \tag{38}$$

Similarly:

$$\mathbb{P}(T_n^c \le t_n^l) = \mathbb{P}(T_n^c \le t_n^l \mid H_{t_n^l}) \cdot \mathbb{P}(H_{t_n^l}) + \mathbb{P}(T_n^c \le t_n^l \mid \bar{H}_{t_n^l}) \cdot \mathbb{P}(\bar{H}_{t_n^l})$$

where:

• $\mathbb{P}(T_n^c \le t_n^l \mid H_{t_n^l}) = 0$ since at time $t = t_n^l$ the destination has received fewer than $n$ innovative packets. Indeed given that $H_{t_n^l}$ holds: $R_{t_n^l} < \mathbb{E} R_{t_n^l} + \epsilon_{t_n^l} \le n$ where the last inequality is due to the definition of $t_n^l$.

• $\mathbb{P}(H_{t_n^l}) \le 1$

• $\mathbb{P}(T_n^c \le t_n^l \mid \bar{H}_{t_n^l}) \le 1$

• $\mathbb{P}(\bar{H}_{t_n^l}) \le \frac{1}{t_n^l}$ due to equation (37).

Therefore:

$$\mathbb{P}(T_n^c \le t_n^l) \le \frac{1}{t_n^l}. \tag{39}$$

Equations (38) and (39) show that the random variable $T_n^c$ representing the time required for $n$ packets to travel across network $\mathcal{G}$ exhibits some kind of concentration between $t_n^l$ and $t_n^u$, which are both functions of $n$. As shown in Lemma 2 in Appendix B, for large enough $n$ a legitimate choice for $t_n^u$ and $t_n^l$ is the following:

$$t_n^u = (n + n^{1/2 + \delta'}) / C, \quad \delta' \in (0, 1/2) \tag{40}$$

$$t_n^l = (n - n^{1/2 + \delta'}) / C, \quad \delta' \in (0, 1/2) \tag{41}$$



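From both (38) and (39):

$$\mathbb{P}(t_n^l < T_n^c < t_n^u) = 1 - \mathbb{P}(T_n^c \le t_n^l) - \mathbb{P}(T_n^c \ge t_n^u) \ge 1 - \frac{1}{t_n^l} - \frac{1}{t_n^u} \tag{42}$$

and by substituting in (42) the $t_n^u$, $t_n^l$ from equations (40) and (41) we get:

$$\mathbb{P}\left( \frac{n - n^{1/2 + \delta'}}{C} < T_n^c < \frac{n + n^{1/2 + \delta'}}{C} \right) \ge 1 - \frac{C}{n - n^{1/2 + \delta'}} - \frac{C}{n + n^{1/2 + \delta'}}$$

and since $\mathbb{E} T_n^c = \frac{n}{C} + O(\sqrt{n})$ we have:

$$\mathbb{P}\left( |T_n^c - \mathbb{E} T_n^c| < \frac{n^{1/2 + \delta}}{C} \right) \ge 1 - \frac{2C}{n} - \frac{2 C n^{2\delta}}{n^2 - n^{1 + 2\delta}}$$

or

$$\mathbb{P}\left( |T_n^c - \mathbb{E} T_n^c| \ge \frac{n^{1/2 + \delta}}{C} \right) \le \frac{2C}{n} + \frac{2 C n^{2\delta}}{n^2 - n^{1 + 2\delta}}$$

where $\delta > \delta'$ and this concludes the proof. ■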
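
The last algebraic step, which combines the two tail terms $C/(n - n^{1/2+\delta})$ and $C/(n + n^{1/2+\delta})$ into the bound of Theorem 6, can be verified numerically. The sketch below is a small check with hypothetical function names; the values of $n$, $C$ and $\delta$ are arbitrary test inputs.

```python
from math import isclose


def sum_of_tail_terms(n, C, delta):
    """1 / t_low + 1 / t_up for t = (n -/+ n^(1/2 + delta)) / C."""
    a = n ** (0.5 + delta)
    t_low = (n - a) / C
    t_up = (n + a) / C
    return 1 / t_low + 1 / t_up


def theorem6_bound(n, C, delta):
    """Right-hand side of the bound stated in Theorem 6."""
    return 2 * C / n + 2 * C * n ** (2 * delta) / (n ** 2 - n ** (1 + 2 * delta))


if __name__ == "__main__":
    for n, C, delta in [(10 ** 4, 0.5, 0.25), (10 ** 6, 1.3, 0.1)]:
        print(n, C, delta, sum_of_tail_terms(n, C, delta), theorem6_bound(n, C, delta))
```

Both expressions equal $2Cn / (n^2 - n^{1+2\delta})$, so the bound vanishes as $n$ grows for any fixed $\delta \in (0, 1/2)$.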

Appendix A
Proof of Proposition 1

Definition 1. A binary relation $\preceq$ defined on a set $P$ is called a preorder if it is reflexive and transitive, i.e. $\forall a, b, c \in P$:

$$a \preceq a \quad \text{(reflexivity)} \tag{43}$$

$$(a \preceq b) \wedge (b \preceq c) \Rightarrow a \preceq c \quad \text{(transitivity)} \tag{44}$$

Definition 2. On the set $\mathbb{N}^{\ell - 1}$ of all integer $(\ell - 1)$-tuples we define the regular preorder $\preceq$, that is $\forall a, b \in \mathbb{N}^{\ell - 1}$, $a \preceq b$ iff $a_1 \le b_1, \ldots, a_{\ell - 1} \le b_{\ell - 1}$ where $a = (a_1, \ldots, a_{\ell - 1})$ and $b = (b_1, \ldots, b_{\ell - 1})$. Similarly we can define the preorder $\succeq$.

Definition 3. A random vector $X \in \mathbb{N}^{\ell - 1}$ is said to be stochastically smaller in the usual stochastic order than a random vector $Y \in \mathbb{N}^{\ell - 1}$ (denoted by $X \preceq_{st} Y$) if: $\forall \omega \in \mathbb{N}^{\ell - 1}$, $\mathbb{P}(X \succeq \omega) \le \mathbb{P}(Y \succeq \omega)$.

Definition 4. A family of random variables $\{Y_n\}_{n \in \mathbb{N}}$ is called stochastically increasing ($\preceq_{st}$-increasing) if $Y_k \preceq_{st} Y_n$ whenever $k \le n$.

Proof of Proposition 1: Markov process $\{Y_n, n \ge 1\}$ is a multidimensional process on $\mathsf{E} = \mathbb{N}^{\ell - 1}$ representing the number of innovative packets at nodes $N_1, \ldots, N_{\ell - 1}$ when packet $n$ arrives at $N_1$. To prove that the Markov process $\{Y_n, n \ge 1\}$ is stochastically increasing we introduce two other processes $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ having the same state space and transition probabilities as $\{Y_n, n \ge 1\}$.

More precisely, Markov process $\{Y_n, n \ge 1\}$ is effectively observing the evolution of the number of innovative packets present at every node of the tandem queue. We define the two new processes $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ to observe the evolution of two other tandem queues having the same link failure probabilities as the queue of $\{Y_n, n \ge 1\}$.

Fig. 6. Multi-hop network with the corresponding Markov chains

As seen in Figure 6, at each time step and at every link, the queues for $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ either both succeed or both fail together. Moreover the successes or failures on each link on the queues observed by $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ are independent of the successes or failures on the queue observed by $\{Y_n, n \ge 1\}$. Formally the joint process $\{(X_n, Z_n), n \ge 1\}$ constitutes a coupling, meaning that marginally each one of $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ has the transition matrix $P_Y$ of $\{Y_n, n \ge 1\}$. If Markov processes $\{X_n, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ have different initial conditions then the following relation holds:

$$X_1 \preceq Z_1 \Rightarrow X_n \preceq Z_n \tag{45}$$

The proof of the above statement is very similar to the proof of Proposition 2 in [18]. Essentially relation (45) states that since at both queues all links succeed or fail together, the queue that holds more packets at each node initially ($n = 1$) will also hold more packets subsequently ($n > 1$) at every node.

The initial state $Y_1$ of Markov process $\{Y_n, n \ge 1\}$ is state $a = (1, 0, \ldots, 0)$, which is also called the minimal state since any other state is greater than the minimal state. To prove Proposition 1 we set both processes $\{Y_n, n \ge 1\}$ and $\{X_n, n \ge 1\}$ to start from the minimal state ($Y_1 \stackrel{d}{=} \delta_a$, $X_1 \stackrel{d}{=} \delta_a$ where $\stackrel{d}{=}$ means equality in distribution), whereas process $\{Z_n, n \ge 1\}$ has initial distribution $\mu$ that is the distribution of process $\{Y_n, n \ge 1\}$ after $(n - k)$ steps ($\mu = P_Y^{n - k} \delta_a$ and $Z_1 \stackrel{d}{=} \mu$). Then for every $\omega$ in the state space of $\{Y_n, n \ge 1\}$ we get:

$$\mathbb{P}(X_n \succeq \omega) = \mathbb{P}(Y_n \succeq \omega) = \mathbb{P}(Z_k \succeq \omega) \tag{46}$$

where the first equality holds since the two processes have the same distribution (both start from the minimal element and have the same transition matrices) and the second equality holds since

$$Z_k \stackrel{d}{=} P_Y^k \mu = P_Y^k \left( P_Y^{n - k} \delta_a \right) = P_Y^n \delta_a \stackrel{d}{=} Y_n.$$

Moreover due to the definition of the minimal element, $X_1 \preceq Z_1$ and using (45) we get $X_n \preceq Z_n$. Therefore

$$\mathbb{P}(Z_k \succeq \omega) \ge \mathbb{P}(X_k \succeq \omega) = \mathbb{P}(Y_k \succeq \omega). \tag{47}$$

The last equality follows from the fact that the two distributions have the same law. Equations (46) and (47) conclude the proof. ■

Appendix B
Proof of Proposition 5

Definition 5. A sequence of random variables $V_0, V_1, \ldots$ is said to be a martingale with respect to another sequence $U_0, U_1, \ldots$ if, for all $n \ge 0$, the following conditions hold:

• $\mathbb{E}[|V_n|] < \infty$

• $\mathbb{E}[V_{n+1} \mid U_0, \ldots, U_n] = V_n$

A sequence of random variables $V_0, V_1, \ldots$ is called a martingale when it is a martingale with respect to itself. That is:

• $\mathbb{E}[|V_n|] < \infty$

• $\mathbb{E}[V_{n+1} \mid V_0, \ldots, V_n] = V_n$

Theorem 7. (Azuma-Hoeffding Inequality): Let $X_0, X_1, \ldots, X_n$ be a martingale such that

$$B_k \le X_k - X_{k-1} \le B_k + d_k$$

for some constants $d_k$ and for some random variables $B_k$ that may be a function of $X_0, \ldots, X_{k-1}$. Then for all $t \ge 0$ and any $\lambda > 0$,

$$\mathbb{P}(|X_t - X_0| \ge \lambda) \le 2 \exp\left( - \frac{2 \lambda^2}{\sum_{k=1}^{t} d_k^2} \right).$$

Proof: Theorem 12.6 in [24]. ■

Proof of Proposition 5: The proof is based on the fact that from a sequence of random variables $U_1, U_2, \ldots, U_n$ and any function $f$ it is possible to define a new sequence $V_0, \ldots, V_n$

$$V_0 = \mathbb{E}[f(U_1, \ldots, U_n)]$$

$$V_i = \mathbb{E}[f(U_1, \ldots, U_n) \mid U_1, \ldots, U_i]$$

that is a martingale (Doob martingale). Using the identity $\mathbb{E}[V \mid W] = \mathbb{E}[\mathbb{E}[V \mid U, W] \mid W]$ it is easy to verify that the above sequence $V_0, \ldots, V_n$ is indeed a martingale. Moreover if function $f$ is $c$-Lipschitz and $U_1, \ldots, U_n$ are independent it can be proved that the differences $V_i - V_{i-1}$ are restricted within bounded intervals [24] (pages 305-306).

Function $R_t = g(z_{11}, \ldots, z_{tL})$ has a bounded expectation, is 1-Lipschitz and the random variables $z_{ij}$ are independent, and therefore all the requirements of the above analysis hold. Specifically by setting

$$G_h = \mathbb{E}[g(z_{11}, \ldots, z_{tL}) \mid \underbrace{z_{11}, \ldots, z_{kr}}_{h \text{ terms in total}}]$$

we can apply the Azuma-Hoeffding inequality on the $G_0, \ldots, G_{tL}$ martingale and we get the following concentration result

$$\mathbb{P}[|G_{tL} - G_0| \ge \lambda] = \mathbb{P}[|R_t - \mathbb{E}[R_t]| \ge \lambda] \le 2 \exp\left\{ - \frac{2 \lambda^2}{tL} \right\}. \tag{48}$$

The equality above holds since

• $G_0 = \mathbb{E}[R_t]$

• $G_{tL} = R_t$ (the random variable itself)

and by substituting in (48) $\lambda$ with $\epsilon_t = \sqrt{\frac{tL}{2} \ln(2t)}$:

$$\mathbb{P}[|R_t - \mathbb{E}[R_t]| \ge \epsilon_t] \le \frac{1}{t}.$$
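
The choice of $\epsilon_t$ can be checked directly: substituting $\lambda = \sqrt{\frac{tL}{2} \ln(2t)}$ into the right-hand side of (48) should give exactly $1/t$. The short sketch below performs this substitution numerically; the function names are hypothetical and the values of $t$ and $L$ are arbitrary.

```python
from math import exp, isclose, log, sqrt


def azuma_tail(lam, t, num_links):
    """Right-hand side of (48): 2 * exp(-2 * lam^2 / (t * L))."""
    return 2 * exp(-2 * lam ** 2 / (t * num_links))


def epsilon(t, num_links):
    """Deviation eps_t = sqrt((t * L / 2) * ln(2 t)) used in Proposition 5."""
    return sqrt(t * num_links / 2 * log(2 * t))


if __name__ == "__main__":
    for t, num_links in [(10, 3), (1000, 5), (10 ** 5, 12)]:
        print(t, num_links, azuma_tail(epsilon(t, num_links), t, num_links), 1 / t)
```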



Lemma 2. A legitimate choice for $t_n^u$ and $t_n^l$ is:

$$t_n^u = (n + n^{1/2 + \delta'}) / C, \quad \delta' \in (0, 1/2)$$

$$t_n^l = (n - n^{1/2 + \delta'}) / C, \quad \delta' \in (0, 1/2)$$

Proof: For any $t \le n / C$, the expected number of received packets $\mathbb{E} R_t$ is given by $\mathbb{E} R_t = Ct - r(t)$, where $C$ is the capacity of the network and $r(t)$ can be bounded as follows. Letting $n_t = Ct \le n$, we have

$$\mathbb{E}(T_{n_t}^c) = \mathbb{E}(\mathbb{E}(T_{n_t}^c \mid r(t))) = \mathbb{E}(t + O(r(t))) = t + O(r(t))$$

which by Theorem 4 implies that $r(t)$ should be $O(\sqrt{n_t}) \le O(\sqrt{n})$.

The only requirement for $t_n^u$ is that it is a $t$ such that $\mathbb{E} R_t - \epsilon_t \ge n$. This is indeed true for large enough $n$ if we substitute $t_n^u$ with $(n + n^{1/2 + \delta'}) / C$:

$$C t_n^u - r(t_n^u) - \sqrt{\frac{t_n^u L}{2} \ln(2 t_n^u)} \ge n \;\Leftrightarrow\; n + n^{1/2 + \delta'} - r(t_n^u) - \sqrt{\frac{L (n + n^{1/2 + \delta'})}{2C} \ln\left( \frac{2 (n + n^{1/2 + \delta'})}{C} \right)} \ge n. \tag{49}$$

Since $r(t) \in O(\sqrt{n})$ there is a constant $B > 0$ such that $r(t) \le B \sqrt{n}$ and therefore in order for (49) to hold it is sufficient that

$$n^{1/2 + \delta'} \ge B \sqrt{n} + \sqrt{\frac{L (n + n^{1/2 + \delta'})}{2C} \ln\left( \frac{2 (n + n^{1/2 + \delta'})}{C} \right)},$$

which holds for large enough $n$ because the right-hand side grows as $O(\sqrt{n \ln n})$.

Similarly it can be proved that $t_n^l$ can be substituted with $(n - n^{1/2 + \delta'}) / C$ such that for large $n$, $\mathbb{E} R_t + \epsilon_t \le n$.